Img Source : https://i2.au.reastatic.net/800x600/4a991d2ad68caac22534dd9368493089cb544fac2efe20a7e8050bdc8ee2aedd/image.jpg
link: https://www.kaggle.com/datasets/dansbecker/melbourne-housing-snapshot
Melbourne, positioned in the southeastern part of Australia, is home to a dynamic real estate scene that mirrors the city’s diverse and evolving character. From the historic charm of Victorian-era homes nestled in inner suburbs to the modern skyline dominating the central business district, Melbourne’s real estate is a tangible reflection of its rich history and contemporary lifestyle. As an integral aspect of urban living, real estate in Melbourne serves as a lens through which one can observe the city’s economic vitality and changing societal preferences. The real estate market in Melbourne is not just about properties; it’s a narrative of the city’s past, present, and future.
This data was scraped from publicly available results posted every week from https://Domain.com.au. The dataset includes Address, Type of Real estate, Suburb, Method of Selling, Rooms, Price, Real Estate Agent, Date of Sale and distance from C.B.D.
Rooms: Number of rooms
Price: Price in dollars
Method: S - property sold; SP - property sold prior;
PI - property passed in; PN - sold prior not disclosed; SN - sold not
disclosed; NB - no bid; VB - vendor bid; W - withdrawn prior to auction;
SA - sold after auction; SS - sold after auction price not disclosed.
N/A - price or highest bid not available.
Type: br - bedroom(s); h - house,cottage,villa,
semi,terrace; u - unit, duplex; t - townhouse; dev site - development
site; o res - other residential.
SellerG: Real Estate Agent
Date: Date sold
Distance: Distance from CBD
Regionname: General Region (West, North West, North,
North east …etc)
Propertycount: Number of properties that exist in
the suburb.
Bedroom2 : Scraped of Bedrooms (from different
source)
Bathroom: Number of Bathrooms
Car: Number of carspots
Landsize: Land Size
BuildingArea: Building Size
CouncilArea: Governing council for the area
The First step is inserting the csv file into R located in data_input
and then installing the necessary plugins including dplyr,
ggplot2, and leaflet that will be used.
#Read data csv
melb <- read.csv("data_input/melb_data.csv")
library(dplyr) #install package dplyr to manipulate character data##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
## Warning: package 'ggplot2' was built under R version 4.3.2
Next, we will inspect the ‘melb’ folder and then remove duplicate rows based on the ‘Address’ column. Additionally, we will eliminate any unnecessary columns.
#Clean duplicates based on address
melb <- melb[!duplicated(melb$Address),]
# Removing unnecessary columns using select()
melb2 <- melb[,!names(melb) %in% c("Address", "Postcode", "Propertycount", "CouncilArea", "YearBuilt", "BuildingArea")]
# Removing NA Rows and "0"
melb2 <- melb2[melb2$Landsize != 0, ]After removing the unnecessary rows and columns, we will proceed to convert the data types to facilitate plotting. The firt column we convert is melb$Date using the “%d/%m/%Y”, and the rest as Factor and integer.
# Converting Date datatype
#Converting Date
melb2$Date <- as.Date(melb2$Date, format = "%d/%m/%Y")
#Converting as Factor & Integer
mutate(.data = melb2,
Suburb = as.factor(Suburb),
Method = as.factor(Method),
Type = as.factor(Type),
Regionname = as.factor(Regionname),
Bedroom2 = as.integer(Bedroom2),
Bathroom = as.integer(Bathroom),
Car = as.integer(Car),
Landsize = as.integer(Landsize)
)Next, we will rename the categories in the ‘Regionname’ and ‘Type’ columns to make them easier to read and eliminate any unnecessary duplicates.
#Renaming Categories for Regionname
melb2 <- melb2 %>%
mutate(Regionname = recode(Regionname, "Northern Victoria" = "Northern Metropolitan"),
Regionname = recode(Regionname, "Eastern Victoria" = "Eastern Metropolitan"),
Regionname = recode(Regionname, "Western Victoria" = "Western Metropolitan"))
#Renaming Categories for Type
melb2 <- melb2 %>%
mutate(Type = recode(Type, "h" = "Housing"),
Type = recode(Type, "u" = "Unit"),
Type = recode(Type, "t" = "Townhouse"))Calculate the price per square meter for each property in the dataset and then removes any outliers (values outside the 1.5*IQR range) from the ‘PricePerSQM’ column, resulting in a dataset with potentially more reliable data for analysis or visualization.
# Creating Price Per Squaremeter Column
melb2$PricePerSQM <- (melb2$Price / melb2$Landsize)
#Removing outlier
Q1 <- quantile(melb2$PricePerSQM, 0.25)
Q3 <- quantile(melb2$PricePerSQM, 0.75)
IQR <- Q3 - Q1
LB <- Q1 - 1.5*IQR
UB <- Q3 + 1.5*IQR
melb2 <- subset(melb2, PricePerSQM >= LB & PricePerSQM <= UB)According to the histogram below, the most frequent number of rooms is 3.
#Number of Rooms in Building
hist(melb2$Rooms, col = c('lightblue'), main = "Number of Rooms", xlab = "Rooms", density = 50)Based on the histogram below, the most frequently occurring distance to the CBD falls within the range of 5 to 15, with a median distance of 10.1.
#Distance
hist(melb2$Distance, col = c('orange'), main = "Melbourne Housing Distance", xlab = "Distance", density = 50)According to the table below, the suburb with the highest count of property sold is Reservoir with a total 160 properties.
#Top 10 suburb with the highest number of sold properties
freq_suburb <- table(melb2$Suburb) # Create frequency table
freq_suburb_df <- as.data.frame(freq_suburb) # Create data frame
colnames(freq_suburb_df) <- c("Suburb", "PropertiesSold") # Renaming columns
freq_suburb_df <- freq_suburb_df[order(-freq_suburb_df$PropertiesSold),]
top10_freq_suburb <- freq_suburb_df[1:10, ]
top10_freq_suburbAccording to the table below, the suburb with the highest average price per SQM is Princes Hill with a price of $7453.
# Top 10 Suburb with highest average of buying price per square meter
average_price <- melb2 %>%
group_by(Suburb) %>%
summarise(Average_Price_Per_Square_Meter = median(PricePerSQM, na.rm = TRUE)) # Creating a table
average_price <- average_price[order(-average_price$Average_Price_Per_Square_Meter),] # Sorting order
colnames(average_price) <- c("Suburb", "Average price Per SQM") # Renaming columns
top10_freq_suburb2 <- average_price[1:10, ]
top10_freq_suburb2boxplot(melb2$PricePerSQM~ melb2$Type,horizontal = F, col = c('orange'), main = "Melbourne Housing Price & Type", xlab = "Type", ylab = "Price Per SQM")# h - house,cottage,villa, semi,terrace
# u - unit, duplex
# t - townhouse; dev site - development siteThe boxplot indicates that the type of housing that has the highest median price per SQM is a Townhouse type, and then Housing, with a close third Unit type.
#Plotting Price PerSQM and Distance to CBD
#cor(melb2$Distance, melb2$PricePerSQM) = -0.4333912
ggplot(data = melb2, aes(x = Distance, y = PricePerSQM, color = Type)) +
geom_point() +
geom_smooth(method = "lm", se = FALSE) +
labs(
title = "Relations between Price and Distance", subtitle = "Melbourne Housing",
x = "Distance",
y = "Price per SQM"
) +
theme_minimal()## `geom_smooth()` using formula = 'y ~ x'
The plot illustrates a correlation between house distance and price per square meter. A pronounced negative correlation is observed for the Housing Type, indicating that shorter distances correspond to higher prices per square meter. The Townhouse type also exhibits a negative correlation, although it is not as robust as the Housing Type. Finally, the Unit type shows a positive correlation, suggesting that longer distances are generally associated with higher prices per square meter.
#Plotting Price PerSQM and number of rooms
boxplot(melb2$PricePerSQM~ melb2$Rooms,horizontal = F, col = c('orange'), main = "Melbourne Housing Rooms & Price", xlab = "Rooms", ylab = "Price Per SQM")
The data indicating that housing units with two rooms
command the highest median price per square meter
suggests an intriguing trend in the real estate market. This finding
prompts a deeper exploration into the factors influencing the perceived
value of properties with a specific room count.
#Plotting Price PerSQM and number of bedroom
boxplot(melb2$PricePerSQM~ melb2$Bedroom2,horizontal = F, col = c('orange'), main = "Melbourne Housing Bedroom & Price", xlab = "Bedroom", ylab = "Price Per SQM")The observation that housing units with two bedrooms command the highest median price per square meter unveils a significant trend in the real estate market. This finding prompts a closer examination of the factors contributing to the premium associated with two-bedroom units. It could be an indicative of possibly catering to young professionals, couples, or small families seeking functional yet compact living arrangements. While the single bedroom has the lowest price among all of them.
#Plotting Price PerSQM and number of bathroom
boxplot(melb2$PricePerSQM~ melb2$Bathroom,horizontal = F, col = c('orange'), main = "Melbourne Housing Bathroom & Price", xlab = "Bathroom", ylab = "Price Per SQM")The data reveals that housing units with four bathrooms exhibit the highest median price per square meter suggests a correlation between bathroom quantity and perceived property value, potentially indicating a market segment placing premium importance on luxurious amenities.However, the price starts to drop-off after it reached the number 6 where maximum is close to the IQR of the box.
#Plotting Price PerSQM and region
ggplot(data = melb2, aes(x = Regionname, y = PricePerSQM, color = Type)) +
geom_col(mapping = aes(fill=Type), position = "dodge") +
theme(
text = element_text(size = 7),# Adjust the font size
panel.background = element_rect(fill="white")
)
According to the barplot , it is observed that housing units located in
the Southern Metropolitan region have the
highest average price compared to other regions, while
the South-Eastern region has the lowest average
price. Typically, the housing type with the
highest cost is
Housing, followed by
Townhouse, and then Unit.
# Map
melbmap <- subset(melb2, melb2$Regionname == "Northern Metropolitan")
melbmap <- data.frame(lat=melbmap$Lattitude, # latitude
lng=melbmap$Longtitude) # longitude
set.seed(50)
melbmap <- melbmap[sample(nrow(melbmap), 25), ]
# get icon
ico <- makeIcon(
iconUrl = "https://cdn-icons-png.flaticon.com/512/25/25694.png",
iconWidth=41/2,
iconHeight=41/2
)
# create a leaflet map widget
map <- leaflet()
# add tiles from open street map
map <- addTiles(map = map)
# add markers
addMarkers(map = map, data = melbmap, icon = ico)📌 In Melbourne, some of the most expensive houses often feature two roomy bedrooms and an impressive four bathrooms. Princes Hill, located in the Northern Metropolitan area, is one place where you can find such luxurious properties.
DOT POINTS
The closer you are to the CBD, the higher the price
The most sold properties is located at Reservoir Suburb
Highest housing price in general: